Detail-Preserving Pooling in Deep Networks
Most convolutional neural networks use some method for gradually downscaling
the size of the hidden layers. This is commonly referred to as pooling, and is
applied to reduce the number of parameters, improve invariance to certain
distortions, and increase the receptive field size. Since pooling by nature is
a lossy process, it is crucial that each such layer maintains the portion of
the activations that is most important for the network's discriminability. Yet,
simple maximization or averaging over blocks, max or average pooling, or plain
downsampling in the form of strided convolutions are the standard. In this
paper, we aim to leverage recent results on image downscaling for the purposes
of deep learning. Inspired by the human visual system, which focuses on local
spatial changes, we propose detail-preserving pooling (DPP), an adaptive
pooling method that magnifies spatial changes and preserves important
structural detail. Importantly, its parameters can be learned jointly with the
rest of the network. We analyze some of its theoretical properties and show its
empirical benefits on several datasets and networks, where DPP consistently
outperforms previous pooling approaches.
Comment: To appear at CVPR 201
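The standard baselines named in this abstract (max pooling, average pooling, and plain strided downsampling) can be sketched as below. This is only a minimal illustration of those baselines, not of DPP itself, whose formulation the abstract does not give. The helper names (`pool2d`, `max_pool`, `avg_pool`, `strided_subsample`) and the toy activation grid are assumptions made for the example, and `strided_subsample` simply keeps the top-left value of each block as a stand-in for a strided convolution with learned weights.

```python
def pool2d(grid, size, reduce):
    """Apply `reduce` to every non-overlapping size x size block of a 2D list."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            # Flatten the block row by row, so block[0] is its top-left value.
            block = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(reduce(block))
        out.append(out_row)
    return out


def max_pool(grid, size=2):
    """Keep the largest activation in each block."""
    return pool2d(grid, size, max)


def avg_pool(grid, size=2):
    """Keep the mean activation of each block."""
    return pool2d(grid, size, lambda block: sum(block) / len(block))


def strided_subsample(grid, size=2):
    """Keep only the top-left activation of each block (hypothetical stand-in
    for a strided convolution; a real one would apply learned weights)."""
    return pool2d(grid, size, lambda block: block[0])


# Toy activation map (assumed values, for illustration only).
activations = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
]

pooled_max = max_pool(activations)
pooled_avg = avg_pool(activations)
pooled_strided = strided_subsample(activations)
```

In the top-right block, the single strong activation (8.0) survives max pooling, is diluted by averaging, and is dropped entirely by subsampling, which illustrates why the choice of pooling operator affects which details are retained.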
Virtual Rephotography: Novel View Prediction Error for 3D Reconstruction
The ultimate goal of many image-based modeling systems is to render
photo-realistic novel views of a scene without visible artifacts. Existing
evaluation metrics and benchmarks focus mainly on the geometric accuracy of the
reconstructed model, which is, however, a poor predictor of visual accuracy.
Furthermore, using only geometric accuracy by itself does not allow evaluating
systems that either lack a geometric scene representation or utilize coarse
proxy geometry. Examples include light field or image-based rendering systems.
We propose a unified evaluation approach based on novel view prediction error
that is able to analyze the visual quality of any method that can render novel
views from input images. One of the key advantages of this approach is that it
does not require ground truth geometry. This dramatically simplifies the
creation of test datasets and benchmarks. It also allows us to evaluate the
quality of an unknown scene during the acquisition and reconstruction process,
which is useful for acquisition planning. We evaluate our approach on a range
of methods including standard geometry-plus-texture pipelines as well as
image-based rendering techniques, compare it to existing geometry-based
benchmarks, and demonstrate its utility for a range of use cases.
Comment: 10 pages, 12 figures, paper was submitted to ACM Transactions on
Graphics for review
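One plausible reading of a novel view prediction error is a leave-one-out loop: hold out each input image, render that view from the remaining images, and compare the rendering with the held-out photograph. The sketch below follows that reading under stated assumptions. The function names, the `render_view` callback, the toy averaging renderer, and the mean absolute pixel error are all hypothetical placeholders; the abstract does not specify the protocol or the image difference metric.

```python
def mean_abs_error(rendered, reference):
    """Placeholder image metric: mean absolute difference over flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(rendered, reference)) / len(reference)


def novel_view_prediction_error(images, render_view, error=mean_abs_error):
    """Leave-one-out evaluation (assumed protocol).

    images      : list of images, each a flat list of pixel intensities
    render_view : callable(training_images, held_out_index) -> predicted image;
                  stands in for any system that can render novel views
    Returns one error value per held-out view. No ground truth geometry is used.
    """
    errors = []
    for i, held_out in enumerate(images):
        training = images[:i] + images[i + 1:]
        predicted = render_view(training, i)
        errors.append(error(predicted, held_out))
    return errors


def toy_average_renderer(training, held_out_index):
    """Hypothetical renderer: predicts a view as the pixel-wise mean of the others."""
    count = len(training)
    return [sum(pixels) / count for pixels in zip(*training)]


# Three tiny two-pixel "images" (assumed values, for illustration only).
views = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
per_view_errors = novel_view_prediction_error(views, toy_average_renderer)
```

Because the loop only needs input images and a renderer, the same harness applies to geometry-plus-texture pipelines and to image-based rendering systems alike, which matches the motivation given in the abstract.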
LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery
State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD,
or YOLO have difficulties detecting dense, small targets with arbitrary
orientation in large aerial images. The main reason is that using interpolation
to align RoI features can result in a lack of accuracy or even loss of location
information. We present the Local-aware Region Convolutional Neural Network
(LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery.
We enhance translation invariance to detect dense vehicles and address the
boundary quantization issue amongst dense vehicles by aggregating the
high-precision RoIs' features. Moreover, we resample high-level semantic pooled
features, making them regain location information from the features of a
shallower convolutional block. This strengthens the local feature invariance
for the resampled features and enables detecting vehicles in an arbitrary
orientation. The local feature invariance enhances the learning ability of the
focal loss function, and the focal loss further helps to focus on the hard
examples. Taken together, our method better addresses the challenges of aerial
imagery. We evaluate our approach on several challenging datasets (VEDAI,
DOTA), demonstrating a significant improvement over state-of-the-art methods.
We demonstrate the good generalization ability of our approach on the DLR 3K
dataset.
Comment: 8 pages
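The focal loss mentioned in this abstract down-weights well-classified examples so that training concentrates on hard ones. A minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), follows. The defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the original focal loss formulation and are an assumption here; the abstract does not state which settings LR-CNN uses.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one example.

    p     : predicted probability of the positive class, in [0, 1]
    y     : ground-truth label, 1 (positive) or 0 (negative)
    gamma : focusing parameter; gamma = 0 reduces to weighted cross-entropy
    alpha : class-balancing weight for the positive class
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy positive (confidently correct) versus a hard positive (confidently wrong).
easy_loss = focal_loss(0.9, 1)
hard_loss = focal_loss(0.1, 1)
```

With `gamma=2.0`, the easy example is scaled by (1 - 0.9)^2 = 0.01 while the hard example is scaled by (1 - 0.1)^2 = 0.81, so the hard example dominates the gradient signal.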
New acquisition techniques for real objects and light sources in computer graphics
Accurate representations of objects and light sources in a scene model are a crucial prerequisite for realistic image synthesis using computer graphics techniques. This thesis presents techniques for the efficient acquisition of real-world objects and real-world light sources, as well as an assessment of the quality of the acquired models. Making use of color management techniques, we set up an appearance reproduction pipeline that ensures the best possible reproduction of local light reflection with the available input and output devices. We introduce a hierarchical model for the subsurface light transport in translucent objects, derive an acquisition methodology, and acquire models of several translucent objects that can be rendered interactively. Since geometry models of real-world objects are often acquired using 3D range scanners, we also present a method based on the concept of modulation transfer functions to evaluate their accuracy. In order to illuminate a scene with realistic light sources, we propose a method to acquire a model of the near-field emission pattern of a light source with optical prefiltering. We apply this method to several light sources with different emission characteristics and demonstrate the integration of the acquired models into both global illumination and hardware-accelerated rendering systems.
Exact representations of the objects and light sources in a model of a scene are an indispensable prerequisite for realistic image synthesis with computer graphics techniques. This dissertation deals with the efficient digitization of real objects and real light sources. It presents both new digitization techniques and methods for determining the quality of the resulting models. We propose a processing pipeline for digitizing and reproducing the color and specularity of objects that, by exploiting color management techniques, enables the best possible reproduction of the object with the given input and output devices. We further introduce a hierarchical model for light transport inside objects made of translucent materials, together with an associated acquisition method, and digitize several real objects. The resulting models can be displayed in real time. The geometry of real objects plays a decisive role in many applications and is often digitized using 3D scanners. We therefore develop a method for determining the accuracy of a 3D scanner that is based on the concept of the modulation transfer function. In order to illuminate a scene with real light sources, we also propose a method for capturing the near-field emission of a light source in which an optical filtering step is performed before digitization. We apply this method to digitize several light sources with different emission characteristics and show how the resulting models can be used in global illumination computations as well as in image synthesis on modern graphics cards
Background Subtraction with Real-time Semantic Segmentation
Accurate and fast foreground object extraction is very important for object
tracking and recognition in video surveillance. Although many background
subtraction (BGS) methods have been proposed in the recent past, it is still
regarded as a tough problem due to the variety of challenging situations that
occur in real-world scenarios. In this paper, we explore this problem from a
new perspective and propose a novel background subtraction framework with
real-time semantic segmentation (RTSS). Our proposed framework consists of two
components, a traditional BGS segmenter and a real-time semantic segmenter.
The BGS segmenter constructs background models and segments foreground
objects. The real-time semantic segmenter refines the foreground segmentation
outputs, which serve as feedback for improving the accuracy of model updates.
The two segmenters work in parallel on two threads. For each input frame, the
BGS segmenter computes a preliminary foreground/background (FG/BG) mask. At
the same time, the real-time semantic segmenter extracts the object-level
semantics. Then, specific rules are applied to the preliminary mask and the
semantics to generate the final detection. Finally, the refined FG/BG mask is
fed back to update the
background model. Comprehensive experiments evaluated on the CDnet 2014 dataset
demonstrate that our proposed method achieves state-of-the-art performance
among all unsupervised background subtraction methods while operating in
real time, and even performs better than some deep-learning-based supervised
algorithms. In addition, our proposed framework is very flexible and has the
potential for generalization
Neural 3D Video Synthesis
We propose a novel approach for 3D video synthesis that is able to represent
multi-view video recordings of a dynamic real-world scene in a compact, yet
expressive representation that enables high-quality view synthesis and motion
interpolation. Our approach takes the high quality and compactness of static
neural radiance fields in a new direction: to a model-free, dynamic setting. At
the core of our approach is a novel time-conditioned neural radiance field
that represents scene dynamics using a set of compact latent codes. To exploit
the fact that changes between adjacent frames of a video are typically small
and locally consistent, we propose two novel strategies for efficient training
of our neural network: 1) an efficient hierarchical training scheme, and 2) an
importance sampling strategy that selects the next rays for training based on
the temporal variation of the input videos. In combination, these two
strategies significantly boost the training speed, lead to fast convergence of
the training process, and enable high quality results. Our learned
representation is highly compact and able to represent a 10 second 30 FPS
multi-view video recording by 18 cameras with a model size of just 28MB. We
demonstrate that our method can render high-fidelity wide-angle novel views at
over 1K resolution, even for highly complex and dynamic scenes. We perform an
extensive qualitative and quantitative evaluation that shows that our approach
outperforms the current state of the art. We include additional video and
information at: https://neural-3d-video.github.io/
Comment: Project website: https://neural-3d-video.github.io
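The importance sampling idea in this abstract, choosing training rays according to the temporal variation of the input videos, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not define how temporal variation is measured, so the example uses the absolute difference between consecutive frames, adds a small `floor` so static pixels are still sampled occasionally, and draws pixel (ray) indices with the standard library's `random.choices`. The function names are hypothetical.

```python
import random


def temporal_variation_weights(frames, t, floor=1e-3):
    """Per-pixel sampling weights for frame t (assumed measure of variation).

    frames : list of frames, each a flat list of pixel intensities
    t      : index of the current frame (t >= 1)
    floor  : small constant so static pixels keep a nonzero probability
    Returns weights that sum to 1.
    """
    diffs = [abs(cur - prev) + floor for cur, prev in zip(frames[t], frames[t - 1])]
    total = sum(diffs)
    return [d / total for d in diffs]


def sample_ray_indices(weights, k, rng=None):
    """Draw k pixel indices with probability proportional to their weights."""
    rng = rng or random.Random()
    return rng.choices(range(len(weights)), weights=weights, k=k)


# Two tiny four-pixel frames in which only pixel 2 changes (assumed values).
video = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
weights = temporal_variation_weights(video, t=1)
batch = sample_ray_indices(weights, k=8, rng=random.Random(0))
```

Here nearly all of the probability mass falls on the one pixel that changed between frames, so most sampled rays pass through the dynamic region, which is the intuition behind spending training effort where the video actually varies.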
Optical Filtering for Near Field Photometry with High Order Basis
Accurately capturing the near field emission of complex luminaires is still very difficult. In this paper, we describe a new acquisition pipeline of such luminaires that performs an orthogonal projection on a given basis in a two-step procedure. First, we use an optical low-pass filter that corresponds to the reconstruction basis to guarantee high precision measurements. The second step is a numerical process on the acquired data that finalizes the projection. Based on this concept, we introduce new experimental setups for automatic acquisition and perform a detailed error analysis of the acquisition process
LR-CNN : Local-aware Region CNN for vehicle detection in aerial imagery
State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD, or YOLO have difficulties detecting dense, small targets with arbitrary orientation in large aerial images. The main reason is that using interpolation to align RoI features can result in a lack of accuracy or even loss of location information. We present the Local-aware Region Convolutional Neural Network (LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery. We enhance translation invariance to detect dense vehicles and address the boundary quantization issue amongst dense vehicles by aggregating the high-precision RoIs' features. Moreover, we resample high-level semantic pooled features, making them regain location information from the features of a shallower convolutional block. This strengthens the local feature invariance for the resampled features and enables detecting vehicles in an arbitrary orientation. The local feature invariance enhances the learning ability of the focal loss function, and the focal loss further helps to focus on the hard examples. Taken together, our method better addresses the challenges of aerial imagery. We evaluate our approach on several challenging datasets (VEDAI, DOTA), demonstrating a significant improvement over state-of-the-art methods. We demonstrate the good generalization ability of our approach on the DLR 3K dataset. © 2020 Copernicus GmbH. All rights reserved